AITopics | static scene

REACT3D: Recovering Articulations for Interactive Physical 3D Scenes

Huang, Zhao, Sun, Boyang, Delitzas, Alexandros, Chen, Jiaqi, Pollefeys, Marc

arXiv.org Artificial Intelligence | Oct-15-2025

Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework and providing a practical foundation for scalable interactive scene generation, thereby lowering the barrier to large-scale research on articulated scene understanding. Our project page is https://react3d.github.io/

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.1134

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)

Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction

Yin, Xiaoting, Shi, Hao, Yang, Kailun, Zhai, Jiajun, Guo, Shangwei, Wang, Lin, Wang, Kaiwei

arXiv.org Artificial Intelligence | Sep-24-2025

Reconstructing dynamic humans together with static scenes from monocular videos remains difficult, especially under fast motion, where RGB frames suffer from motion blur. Event cameras exhibit distinct advantages, e.g., microsecond temporal resolution, making them a superior sensing choice for dynamic human reconstruction. Accordingly, we present a novel event-guided human-scene reconstruction framework that jointly models human and scene from a single monocular event camera via 3D Gaussian Splatting. Specifically, a unified set of 3D Gaussians carries a learnable semantic attribute; only Gaussians classified as human undergo deformation for animation, while scene Gaussians stay static. To combat blur, we propose an event-guided loss that matches simulated brightness changes between consecutive renderings with the event stream, improving local fidelity in fast-moving regions. Our approach removes the need for external human masks and simplifies managing separate Gaussian sets. On two benchmark datasets, ZJU-MoCap-Blur and MMHPSD-Blur, it delivers state-of-the-art human-scene reconstruction, with notable gains over strong baselines in PSNR/SSIM and reduced LPIPS, especially for high-speed subjects.
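
The event-guided loss described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the common event-camera model in which each event marks a fixed step in log intensity, and the function name `event_guided_loss`, the `contrast_threshold` value, and the list-of-lists image format are all hypothetical.

```python
import math


def event_guided_loss(prev_render, next_render, event_sum,
                      contrast_threshold=0.2, eps=1e-6):
    """Mean squared gap between rendered and event-implied brightness change.

    prev_render, next_render: 2D lists of positive intensities from two
        consecutive renderings.
    event_sum: 2D list of signed event counts per pixel (positive minus
        negative polarity) accumulated between the two render times.
    contrast_threshold: assumed log-intensity step per event (hypothetical).
    """
    total = 0.0
    count = 0
    for row_prev, row_next, row_ev in zip(prev_render, next_render, event_sum):
        for p, n, e in zip(row_prev, row_next, row_ev):
            # Brightness change simulated from the two renderings.
            simulated = math.log(n + eps) - math.log(p + eps)
            # Brightness change implied by the event stream.
            observed = contrast_threshold * e
            total += (simulated - observed) ** 2
            count += 1
    return total / count
```

Under these assumptions the loss is near zero when the rendered brightness change agrees with the accumulated events, and grows in fast-moving regions where the renderings lag behind what the events report.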

artificial intelligence, machine learning, simulation of human behavior, (15 more...)

arXiv.org Artificial Intelligence

2509.18566

Country: Asia > China (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.34)

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

Lin, Jingyu, Gu, Jiaqi, Fan, Lubin, Wu, Bojian, Lou, Yujing, Chen, Renjie, Liu, Ligang, Ye, Jieping

arXiv.org Artificial Intelligence | Feb-28-2025

Generating high-quality novel view renderings with 3D Gaussian Splatting (3DGS) in scenes featuring transient objects is challenging. We propose a novel hybrid representation, termed HybridGS, that uses 2D Gaussians for the transient objects in each image and maintains traditional 3D Gaussians for the whole static scene. Note that 3DGS itself is better suited to modeling static scenes that satisfy multi-view consistency, whereas transient objects appear only occasionally and violate this assumption; we therefore model them as planar objects from a single view, represented with 2D Gaussians. Our novel representation decomposes the scene from the perspective of fundamental viewpoint consistency, making it more reasonable. Additionally, we present a novel multi-view regulated supervision method for 3DGS that leverages information from co-visible regions, further enhancing the distinction between transients and statics. We then propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis across various settings. Experiments on benchmark datasets show state-of-the-art novel view synthesis performance in both indoor and outdoor scenes, even in the presence of distracting elements.

computer vision and pattern recognition, gaussian, representation, (10 more...)

arXiv.org Artificial Intelligence

2412.03844

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

GraspSplats: Efficient Manipulation with 3D Feature Splatting

Ji, Mazeyu, Qiu, Ri-Zhao, Zou, Xueyan, Wang, Xiaolong

arXiv.org Artificial Intelligence | Sep-3-2024

The ability of robots to perform efficient and zero-shot grasping of object parts is crucial for practical applications and is becoming prevalent with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations to support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to their implicitness, and that point-based methods are inaccurate for part localization without rendering-based optimization. To amend these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats generates high-quality scene representations in under 60 seconds. We further validate the advantages of Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods like F3RM and LERF-TOGO, and 2D detection methods.

graspsplat, manipulation, representation, (13 more...)

arXiv.org Artificial Intelligence

2409.02084

Country:

Africa > Togo (0.26)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Real-time SLAM Pipeline in Dynamics Environment

Fu, Alex, Kong, Lingjie

arXiv.org Artificial Intelligence | Mar-3-2023

Inspired by the recent success of dense-data approaches such as ORB-SLAM and RGB-D SLAM, we propose an improved pipeline for real-time SLAM in dynamic environments. Unlike previous SLAM systems, which can only handle static scenes, we present a solution that uses RGB-D SLAM together with YOLO real-time object detection to segment and remove dynamic objects and then reconstruct the static scene in 3D. We gathered a dataset that allows us to jointly consider semantics, geometry, and physics, and thus enables us to reconstruct the static scene while filtering out all dynamic objects.
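
The filtering step in this abstract can be illustrated with a small sketch. It is an assumption-laden example rather than the paper's pipeline: the function name `mask_dynamic_pixels`, the max-exclusive box format, and the use of 0.0 as an "invalid depth" marker are all hypothetical, and the boxes are simply assumed to come from a detector such as YOLO.

```python
def mask_dynamic_pixels(depth, boxes, invalid=0.0):
    """Return a copy of a depth image with detected regions invalidated.

    depth: 2D list of depth values (rows of floats).
    boxes: iterable of (x_min, y_min, x_max, y_max) pixel boxes,
        max-exclusive, assumed to cover dynamic objects.
    invalid: value marking pixels that later mapping should skip
        (an assumed convention, not taken from the paper).
    """
    height = len(depth)
    width = len(depth[0]) if height else 0
    masked = [row[:] for row in depth]  # leave the input untouched
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp each box to the image bounds before overwriting.
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                masked[y][x] = invalid
    return masked
```

A downstream RGB-D SLAM stage would then ignore the invalidated pixels, so only static geometry contributes to the reconstructed map.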

artificial intelligence, machine learning, real time system, (20 more...)

arXiv.org Artificial Intelligence

2303.02272

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Architecture > Real Time Systems (0.83)

An Exploration of Neural Radiance Field Scene Reconstruction: Synthetic, Real-world and Dynamic Scenes

Quartey, Benedict, Akbulut, Tuluhan, Mgonzo, Wasiwasi, Yong, Zheng Xin

arXiv.org Artificial Intelligence | Oct-21-2022

This project presents an exploration into 3D scene reconstruction of synthetic and real-world scenes using Neural Radiance Field (NeRF) approaches. We primarily take advantage of the reduction in training and rendering time of neural graphic primitives multi-resolution hash encoding, to reconstruct static video game scenes and real-world scenes, comparing and observing reconstruction detail and limitations. Additionally, we explore dynamic scene reconstruction using Neural Radiance Fields for Dynamic Scenes (D-NeRF). Finally, we extend the implementation of D-NeRF, originally constrained to handle synthetic scenes, to also handle real-world dynamic scenes.

Traditional NeRF approaches can reconstruct both synthetic and real-world scenes, and new methods like Instant Neural Graphics Primitives [5] significantly speed up the NeRF training process; however, these methods are limited to scenes with static objects. D-NeRF (Dynamic NeRF [7]) extends traditional NeRF with time conditioning, making it possible to reconstruct scenes with dynamic objects; however, the implementation of D-NeRF was limited to synthetic scenes where ground truth camera parameters exist. Our goal is to extend the implementation of D-NeRF to reconstruct real-world scenes with dynamic objects like dancing people.

artificial intelligence, machine learning, reconstruction, (12 more...)

arXiv.org Artificial Intelligence

2210.12268

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Games > Computer Games (0.37)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Unveiling Unexpected Training Data in Internet Video

Communications of the ACM | Jul-27-2021, 21:00:03 GMT

During training, the squared L2 error between the clean spectrogram and the predicted spectrogram is used as the loss function. At inference time, our separation model can be applied to arbitrarily long segments of video and varying numbers of speakers. The latter is achieved either by directly training the model with multiple input visual streams (one per speaker), or simply by feeding the visual features of the desired speaker to the visual stream. For full details about the architecture and training process, see our full paper [15].
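
As a small illustration of the training loss mentioned above, the squared L2 error between two spectrograms can be computed as below. This is only a sketch: the function name `squared_l2_loss` and the list-of-lists spectrogram layout (time frames by frequency bins) are assumptions, and the excerpt does not say whether the sum is normalized, so none is applied here.

```python
def squared_l2_loss(clean, predicted):
    """Sum of squared differences between two equally sized spectrograms.

    clean, predicted: 2D lists of magnitudes with the same shape
        (the layout is an assumption made for this example).
    """
    total = 0.0
    for row_clean, row_pred in zip(clean, predicted):
        for c, p in zip(row_clean, row_pred):
            total += (c - p) ** 2
    return total
```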

computer vision and pattern recognition, depth map, video, (13 more...)

Communications of the ACM

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Israel (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Industry:

Leisure & Entertainment (0.93)
Media > Television (0.68)
Media > Film (0.68)
Media > Photography (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Segmentation of static scenes

Riseman, E. M. | Arbib, M.

Classics | Feb-1-1977

A wide range of segmentation techniques continues to evolve in the literature on scene analysis. Many of these approaches have been constrained to limited applications or goals. This survey analyzes the complexities encountered in applying these techniques to color images of natural scenes involving complex textured objects. It also explores new ways of using the techniques to overcome some of the problems which are described. An outline of considerations in the development of a general image segmentation system which can provide input to a semantic interpretation process is distributed throughout the paper.

artificial intelligence, segmentation, static scene, (3 more...)

Classics

Genre: Overview (0.88)

Technology: Information Technology > Artificial Intelligence (0.62)
